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ABSTRACT 

The 3D higher order organization of chromatin 
within the nucleus of eukaryotic cells has so far 
remained elusive. A wealth of relevant information, 
however, is increasingly becoming available from 
chromosome conformation capture (3C) and 
related experimental techniques, which measure 
the probabilities of contact between large numbers 
of genomic sites in fixed cells. Such contact 
probabilities (CPs) can in principle be used to 
deduce the 3D spatial organization of chromatin. 
Here, we propose a computational method to 
recover an ensemble of chromatin conformations 
consistent with a set of given CPs. Compared with 
existing alternatives, this method does not require 
conversion of CPs to mean spatial distances. 
Instead, we estimate CPs by simulating a physically 
realistic, bead-chain polymer model of the 30-nm 
chromatin fiber. We then use an approach from 
adaptive filter theory to iteratively adjust the param- 
eters of this polymer model until the estimated CPs 
match the given CPs. We have validated this method 
against reference data sets obtained from simula- 
tions of test systems with up to 45 beads and 
4 loops. With additional testing against experiments 
and with further algorithmic refinements, our 
approach could become a valuable tool for re- 
searchers examining the higher order organization 
of chromatin. 

INTRODUCTION 

Eukaryotic cells need to accommodate their long genomic 
DNA within a relatively small nucleus. This remarkable 
feat is accomplished through several levels of 3D spatial 
organization (1). The first level consists of wrapping the 
DNA duplex around octamers of histone proteins to form 
nucleosomes. The resulting string of nucleosomes is then 



folded into a thicker fiber known as chromatin. 
Subsequent levels of folding ultimately lead to the terri- 
torial arrangement of chromosomes within the nucleus. 
These additional levels of folding, referred to as higher 
order organization of chromatin, are not only essential 
for efficient DNA packaging but are also believed to 
play a role in several other biological processes. For 
example, the formation of chromatin loops facilitates 
interactions between distant portions of DNA and these 
interactions are essential for regulating transcription and 
recombination (2-5). Also, the transcriptional activity of 
genes tends to be inversely correlated with the spatial 
density of chromatin fibers (6-9). Furthermore, growing 
evidence suggests that spatially proximal regions of the 
genome are more likely to be functionally correlated, 
leading to the concepts of 'factories', 'globules' and 
'territories' (10-14). Unfortunately, owing to the limita- 
tions of current experimental methods in visualizing chro- 
matin in vivo, the 3D higher order organization of 
chromatin is not well understood. In particular, state-of- 
the-art microscopy approaches, such as fluorescence in situ 
hybridization (FISH) (15) and super-resolution fluores- 
cence microscopy (16), do not simultaneously provide 
the spatial resolutions and the measurement throughput 
necessary to discern and locate individual chromatin fibers 
within the nucleus. 

During the past decade, however, increasingly higher 
resolution and throughput have been achieved by a 
number of sophisticated experimental techniques — 
including 4C (17,18), 5C (19), GCC (20) and Hi-C 
(21) — that are based on the original method of chromo- 
some conformation capture (3C) (22). These techniques do 
not directly capture the 3D spatial organization of chro- 
matin. Instead, they measure the frequency of interactions 
between different fragments of genomic DNA in fixed cells 
(23). To detect such interactions, spatially proximal 
segments of DNA are covalently cross-linked by treating 
millions of intact nuclei with chemical agents, such as for- 
maldehyde. The DNA is then cleaved into small fragments 
by digestion with appropriate restriction enzymes. 
Next, the resulting pairs of cross-linked fragments are 
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enzymatically ligated and the cross-links are chemically 
removed. Finally, the ligation products are amplified by 
polymerase chain reaction and sequenced by high- 
throughput methods. Analysis of the sequences allows 
one to identify the pairs of fragments that were originally 
cross-linked. Counting the number of times that each pair 
was identified from the sequences yields a 2D map of 
contact probabilities (CPs) for the examined pairs of 
fragments. 

Although CP maps provide abundant information to 
help researchers infer the higher order organization of 
chromatin through theoretical and computational 
models (24,25), such a task is rather challenging. To 
tackle this problem, several approaches have already 
been proposed. Dekker et al. (22) presented the first 
such approach to deduce a coarse 3D structure for the 
320-kb chromosome III in NKY2997 cells. To obtain 
the structure, 78 CPs were measured by 3C and converted 
to spatial distances through a theoretical expression for 
worm-like chains (26). The resulting distances were pre- 
sumably used to solve a molecular distance geometry 
problem. Later, Fraser et al. (27) assumed the inverse pro- 
portionality relation d on l/p to calculate spatial distances 
d from hundreds of CPs p, obtained by 5C experiments on 
the HoxA gene cluster in THP-1 leukemia cells. The re- 
sulting distances d were then used as targets to optimize a 
piecewise linear curve representing the gene cluster under 
study. The same relation d oc l/p was used by Duan et al. 
(28) to infer the 3D structure of the budding yeast genome 
from over 65 000 CPs obtained by 4C. In addition, they 
modeled chromatin as a chain of beads, each representing 
10 kb of DNA, and defined various constraints to enforce 
known geometric and topological features of yeast chro- 
matin. Nonlinear constrained optimization methods were 
then used to find an optimal structure. Another full 
genome, that of fission yeast, was studied by Tanizawa 
et al. (29) using a Hi-C variant with enrichment of 
ligation products. To determine the 3D structure of this 
genome, the authors used a bead-chain model and a 
method similar to that of (28). This time, however, 
spatial distances were calculated from CPs through a cali- 
bration curve obtained by fitting a double exponential 
decay function to distance measurements obtained by 
FISH. 

A bead-chain model of chromatin was also employed by 
Bau et al. (12), who used 5C to analyze the 500-kb 
ENm008 domain (a-globin gene) on human chromosome 
16 in K562 cells and in GM 12878 cells. In this case, 
though, each bead represented a DNA restriction 
fragment, with bead radius proportional to fragment 
length. The beads interacted through harmonic restraints 
with strengths and equilibrium distances derived from ex- 
perimental CPs. A combination of optimization and clus- 
tering algorithms was then used to determine a 
conformation ensemble and corresponding centroid struc- 
ture for the ENm008 domain in each cell type. Again 
seeking conformation ensembles, Rousseau et al. (30) 
used a probabilistic approach to analyze 5C data on the 
142-kb HoxA cluster in THP-1 and HB-1119 cell lines, 
and Hi-C data from (21) on the 88.4-Mbp long arm of 
human chromosome 14. In particular, they applied a 



Markov chain Monte Carlo sampling method to 
generate ensembles of structures consistent with a poster- 
ior distribution of spatial distances between restriction 
fragment midpoints, where the distances were again 
obtained from experimental CPs by assuming an inverse 
power law relation. Another effort to obtain chromatin 
conformation ensembles, but without using a 
distance-CP relationship, was recently reported by 
Gehlen et al. (31). To generate such ensembles for the 
entire 5. cerevisiae genome, the authors performed 
multiple molecular dynamics simulations of a bead-chain 
polymer model and included within each simulation a 
randomly selected subset of intra- and inter-chromosomal 
interactions experimentally determined through GCC. 

Although the above computational approaches are re- 
markable in their ability to handle large numbers of inter- 
acting fragments, almost all of them rely on converting the 
measured CPs to spatial distances between interacting 
fragments. Such conversion is achieved by assuming a 
functional relation that describes the behavior of free 
linear chains. For example, polymer theory predicts that 
p cx d~ 3 for ideal random walk chains (32), whereas more 
elaborate relations have been derived for worm-like chains 
(33). These relations, however, may not be valid for 
polymers subjected to looping and other external con- 
straints. Also, several of the above approaches ignore 
the mechanical properties of the chromatin fiber or deter- 
mine only a single average structure from a given set of 
CPs, which are in fact the result of cross-linking events 
over an ensemble of chromatin conformations sampled 
from millions of cells. Finally, none of the above studies 
validate their proposed computational methods against 
known chromatin conformation ensembles. 

Here, we describe and validate a computational 
approach to obtain ensembles of chromatin conformation 
consistent with a given set of reference CPs. This approach 
does not require assuming a functional relation between 
spatial distances and CPs. Instead, we estimate new CPs 
by simulating a coarse-grained polymer model that ap- 
proximates the physical behavior of a 30-nm chromatin 
fiber. We then iteratively adjust the parameters of this 
polymer model until a good match is achieved between 
the CPs estimated from the simulations and those in the 
given reference set. The result is an 'optimal' ensemble of 
conformations that is most consistent with the given ref- 
erence CPs. Our initial validation of this approach against 
several simulated test systems produced good agreement 
of average spatial properties between reference and re- 
covered conformation ensembles. 



MATERIALS AND METHODS 

Our goal is to generate an ensemble of conformations 
consistent with a given set of probabilities of contact 
between different segments of a chromatin fiber. To 
achieve this goal, we propose a computational method 
that consists of three main components (Figure 1): (i) a 
coarse-grained polymer model approximating the physical 
properties of chromatin; (ii) a procedure to generate an 
ensemble of conformations for the polymer model and 
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Figure 1. Main components of the proposed computational approach 
to recover a conformation ensemble from a given set of reference CPs. 



(iii) a procedure to refine the parameters of the polymer 
model in such a way that the generated conformation 
ensemble is consistent with the given set of CPs. 
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Figure 2. Schematic representations of (a) restrained bead-chain 
polymer model used for BD simulations of a 30-nm chromatin fiber 
subjected to looping constraints and (b) application of the LMS algo- 
rithm to the optimization of the parameters in the general linear model 
(Equation 9) used to predict restraint spring constants from reference 
CPs. 



Coarse-grained polymer model of chromatin 

We assume that chromatin exists as a fiber with an average 
diameter of 30 nm, and that the conformation of this fiber 
is determined primarily by its stretching resistance, 
bending stiffness and excluded volume. To approximate 
these physical properties, we use a bead-chain model, with 
each bead representing a chromatin segment of 3-6 kb 
(34). A similar model was proposed by Rosa et al. (35) 
and was recently used to simulate the entire genome of 
budding yeast (36). Following Bau et al. (12), we also 
assume that the chromatin fiber is subjected to unknown 
external constraints, e.g. due to looping interactions or 
confinement, and that the average effects of these con- 
straints can be approximated by additional harmonic 
restraints connecting particular beads in the chain 
(Figure 2a). 

Thus, the potential energy U of the bead chain can be 
expressed as the sum of four terms, 

U= £/bond + £4end+£4xcl+^rest- (1) 

The first term E/b on d accounts for the chain's resistance 
to stretching and results from connecting adjacent beads 
with harmonic springs, 

JV— 1 j 

t/bond = ^is(rfi,/+l - do) , (2) 
i=l L 

where dy = \tj — r,| is the distance between beads i and j, 
N is the number of beads in the chain, k s is the spring 
constant, r, is the position vector for bead i, do = a is 
the equilibrium bond length and a = 30 nm is the unit of 
length used in our simulations (Table 1). 

The second potential energy term [/bend accounts for the 
chain's resistance to bending and results from subjecting 
each triplet of adjacent beads to a harmonic bending po- 
tential (37), 

N-2 , 

CW = j]=*^, (3) 

where 6, is the angle between the displacement vectors 
r /+ i-r,- and T i+2 - r !+ i, kg = k s TL p /a is an angular 
'spring constant', L p is the persistence length of the 
chain (38), k B is the Boltzmann constant and T is the 
absolute temperature. 



The third potential energy term, (7 exc i, accounts for the 
excluded volume, or effective thickness, of the chain and is 
treated using the repulsive part of the Lennard-Jones 
potential, 

(4) 

where e = k^T is the unit of energy in our simulations, 
a = 30 nm is the effective thickness of the fiber (Table 1) 
and ©(x) is the Heaviside step function, which equals 1 
when x > 0 and 0 otherwise. 

The last potential energy term, U rest , accounts for the 
presence of external forces or constraints that affect the 
shape of the chromatin fiber as well as the probability of 
contact between different segments of the fiber. In particu- 
lar, we assume that the average effects of these constraints 
can be reproduced reasonably well by including a suffi- 
cient number of harmonic restraints that connect a 
subset of the beads in the chain, 

C/rest = £ \kij\dij - <f , (5) 

where R is the set of pairs of beads connected by harmonic 
restraints and ky (d-,) is the spring constant (equilibrium 
distance) for the restraint connecting beads i and j. The 
actual members (;', J) of the set R and the corresponding 
values of ky and are adjustable parameters in this 
model of restrained chromatin. 

Generation of conformation ensembles 

To obtain optimal values for these adjustable parameters, 
we compare a reference set of probabilities of contact 
between the beads in the chain with a set of corresponding 
probabilities estimated from an ensemble comprising a 
large number of bead-chain conformations. 

Simulations of bead chain 

To obtain such an ensemble, we start by minimizing the 
potential energy of an initial conformation. To this end, 
we use the Polak-Ribiere modification of the conjugate 
gradient algorithm (39). Next, we equilibrate the 
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Table 1. Parameter values used to simulate the restrained bead-chain polymer model of chromatin and to provide a physically realistic approxi- 
mation of the mechanical properties of chromatin, as currently known from experiments 



Parameter 


Symbol 


deduced units 


■5T ii i*i ire 
ol lull La 


Thermal energy" 


k B T 


1.0 


4.1 x 10- 21 J 


Bead mass b 


m 


1.0 


7.0 x 10- 21 kg 


Lennard-Jones size parameter 


a 


1.0 


30 nm 


Lennard-Jones energy parameter 


e 


l.0k B T 


4.1 x 10- 21 J 


Bead separation 


do 


1.0(T 


30 nm 


Contact distance 0 


<k 


1.5 a 


45 nm 


Bond spring constant 11 


^ S 


500 k B T/a 2 


2.3 x lO^JnT 2 


Persistence length 6 


L P 


4.0 a 


120nm 


Bending energy constant 


ke 


4.0k B T/md 2 


1.7 x 10- 20 J rad~ 2 


Time step/damping constant 


At/y 


3.3 x \0- 4 a 2 m/k B T 


5.1 x 10- 19 s 2 



"Energy per bead per degree of freedom at T = 300 K. 

b Representative value based on the experimental measurement of 23.3 MDa for a 15.5-kb fragment of 30-nm chromatin upstream of the chicken 
/6-globin locus (42). 

"Following Rosa el al. (35), equivalent to assuming that contacts between chromatin fibers are mediated by proteins of 15-nm diameter. 
d From experiments, the stretching modulus is dok s ~ 5-150 pN (43), hence k s ranges from 1.7 x 10~ 4 to 2.5 x 10~ 2 Jm~ 2 . 
e From experiments, L p ~ 30-200nm (43). 

f To maximize conformation sampling efficiency, we used the largest value of At/y found to maintain stability of the BD simulations. A lower bound 
for At can be estimated by considering a chromatin sphere of radius ;■ = 15 nm and using y = (mr\r/m with the viscosity of water ij = 890 uPa s at 
25°C and 1 bar (44). Then, At ~ 18 ns. 



energy-minimized chain by performing 10 6 steps of 
Brownian dynamics (BD) simulation. We then perform 
an additional BD simulation during which we collect 
one conformation every 100 integration steps. The set of 
conformations collected from a single simulation trajec- 
tory constitute a conformation ensemble. 

To perform the BD simulations, we apply a second- 
order algorithm (40,41), which we simplify to neglect the 
effects of hydrodynamic interactions. Specifically, for each 
bead i, we calculate a tentative new position at time t+At 
using the position r,-(r) of the bead and the force f,-(?) on the 
bead at time t, 



At IknTAt 

Tlt+At) = r,(/)+ — fi(t)+J— N(0, (6) 

ym y ym 

where y — k^T/D s m is the damping constant, D s is the 
self-diffusion coefficient, A^ is the integration time step, 
m is the mass of each bead and N(?) is a random displace- 
ment vector whose components are normally distributed 
with mean 0 and variance 1. Next, we use the tentative 
bead positions to calculate a tentative new force f,-(r+A?) 
for each bead i. Then, for each bead i, we calculate a more 
accurate position using the tentative position and tentative 
force at time t+At and the force at time t, 

rfc+At) = r-(H-A0+^[-f.(/)+&*f A/)]. (7) 
2ym 

Finally, these latter bead positions are used to calculate 
more accurate forces ij(t+At) at time t+At. 

Estimation of CPs 

To estimate the bead CPs from an ensemble of bead chain 
conformations, we analyze each member of the ensemble 
and check for the occurrence of contacts within all 
possible pairs of beads in the chain. Following Rosa 
et al. (35), a contact between two beads is defined to 
occur whenever the distance between the beads is less 



than a predefined 'contact' distance, d c (Table 1). Hence, 
we estimate the probability of contact p t j between beads i 
and j by calculating the proportion of conformations in 
which a contact occurs between those beads, 

^ = ^E 0 ^-<), (8) 

where N c is the total number of conformations in the 
ensemble and d\ ■ is the distance between beads i and j in 
conformation / of the ensemble. 

Reflnement of model parameters 

In this work, we assume that a set of reference CPs, 
denoted p*j, is available for 1 < i <j < N, and the 
problem is to find an ensemble of bead-chain conform- 
ations consistent with those CPs. To this end, we need 
to optimize the adjustable parameters of the bead-chain 
model so that simulating such a model yields a conform- 
ation ensemble whose estimated CPs pij match as closely 
as possible the corresponding reference CPs p*, for 
1 < i < j < N. 

The adjustable parameters to be optimized are the pairs 
of indexes (ij) e R and the values of fcy and cf-, for each 
To begin to tackle this complex problem, we choose 
to reduce the number of adjustable parameters by fixing 
the members (/', j) of the set R at the start of the optimiza- 
tion procedure and by using zero as the equilibrium 
distance for the harmonic restraints, i.e. 
dj. = 0,V (ij) e R. Although cf-, = 0, excluded volume 
interactions (Equation 4) prevent beads from overlapping. 

Placement of harmonic restraints 

To determine the pairs (ij) e R of beads that must be 
connected by harmonic restraints, we analyze the given 
set of reference CPs p*,, for 1 < i <j < TV, by using a 
peak detection algorithm. Specifically, we construct a 
smooth surface z = g(x,y) such that g(ij) ~ p*, for 
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1 < i < j < N. Each peak on this CP 'surface' corresponds 
to a pair of beads that interact more frequently than their 
neighbors. Thus, we find the pair of integers (i, j) closest to 
the location (x p ,y p ) of each peak in the CP surface, and we 
add (/, j) to the set R of pairs of beads connected by 
harmonic restraints. To find the location (x p ,y p ) of each 
peak, we slice the surface at every point (/,/), for 
1 < i < j < N, using the four vertical planes x = i, y = j, 
x + y = i +j and x — y = i — j. Next, we find the local 
maxima of the curve generated by each slice. If the 
curves on all four slices have a local maximum close to 
( x p J^) ^ (vX then we deem (x p ,y p ) to be the location of a 
peak on the CP surface. 

Optimization of restraint spring constants 
The remaining group of adjustable parameters are the 
spring constants kjj that determine the strength of the 
harmonic restraints on bead pairs (/,/) e R. To predict 
these spring constants from the known reference CPs, 
we use the general linear model 

k* = Wp*. (9) 

Here, k* is a vector containing n predicted spring con- 
stants kjj of the harmonic restraints, where n is the 
number of bead pairs in R\ W is an n x (n+l) matrix of 
model parameters and p* is a vector containing n+l 
elements, where the first n elements are the reference 
CPs p*j for the pairs of beads connected by the n 
harmonic restraints, and the last element is a non-zero 
constant c that allows W to map the background CPs of 
an unrestrained chain to zero spring constants. As c is 
multiplied by appropriate weights, its value can be arbi- 
trary. To minimize roundoff errors, however, we use 

Now the problem of finding optimal values for the 
spring constants kjj becomes a problem of determining 
the optimal elements of the matrix W. This is not a 
trivial problem, because each spring constant kjj may in 
general affect not only the CP for the pair (i, j) of beads 
connected by that spring but also the CPs for other pairs 
of beads in the chain, including those connected by other 
restraints. Also, because Equation 9 is an approximation, 
the optimal W will not necessarily yield valid spring con- 
stants kij when p* changes. Thus, in general, an optimal 
W must be determined for each given p*. 

One could argue that predicting the k t js through 
Equation 9 is an unnecessary complication, because 
optimal kyS could be found more simply by using a 
standard optimization algorithm that adjusts the kjjs to 
minimize the sum of squared differences 
J2(ij)eR(PiJ ~ Pij) • We did not pursue such a blind 
approach, however, because we suspected that it would 
be less efficient than alternative methods that take advan- 
tage of additional information about the underlying 
physical system. Such information, in the proposed 
approach, is the hypothesis that there exists an 'inverse' 
system that converts the CPs to spring constants accord- 
ing to the general linear model of Equation 9. 

To find optimal elements for W in Equation 9, we apply 
the least mean squares (LMS) algorithm developed by 



Widrow and colleagues (45-47) (see the Appendix). This 
simple yet powerful algorithm has been extensively used in 
the field of adaptive signal processing to optimize a digital 
filter structure known as adaptive linear combiner (ALC). 
An ALC performs a dot product between a time-varying 
weight vector w/t and a time-varying input vector x k , thus 
obtaining a scalar output y k = w/, T x /o which is required to 
approximate a given desired signal <& at each discrete time 
step k. To meet this requirement, the LMS algorithm uses 
a steepest descent scheme that iteratively adjusts the 
elements of the weight vector w/ f at each time step k using 

Wfe+l = W k +2^£ k X k , (10) 

where s k = <& — y k is the error at time step k and fi is a 
gain factor that affects the speed of convergence and the 
stability of the algorithm. 

To apply the LMS algorithm toward the optimization 
of the parameter matrix W in Equation 9, we allow this 
matrix, the CPs and the predicted spring constants to vary 
with iteration index k, i.e. k k = W k p k . We then treat the 
elements of k k and the rows of W k as the outputs and 
transposed weight vectors, respectively, of n ALCs, 

k /> = [>'U J'2,A- ■ ■ -n,k] T , (11) 

W ft = [wi,* w 2 ,a- . . . w„ >/f ] T . (12) 

To complete this application of the LMS algorithm, 
we must provide appropriate inputs to the ALCs and 
obtain appropriate errors, which are necessary to adjust 
the weight vectors. To obtain an input vector x k for all 
ALCs, we first predict a set of restraint spring constants 
using the parameter matrix available at iteration k 
and the constant vector of reference CPs (first block 
in Figure 2b), i.e. k* k = W^p*. Next, we use this set of 
spring constants to generate, through BD simulations, 
an ensemble of bead-chain conformations, and we use 
this ensemble to estimate the CPs for the restrained bead 
pairs (ij) e R (second block in Figure 2b). The resulting 
vector p /f of estimated CPs is now used as input 
for all n ALCs, which produce a corresponding out- 
put vector k>. = \Vfcp\- at iteration k (third block in 
Figure 2b). If the weights of the ALCs were optimal, 
then the ALC outputs in k k at iteration k would be very 
close to the spring constants in k£, which were used to 
generate the ensemble of conformations from which p k 
was estimated. Therefore, the ALC errors are the 
elements of the vector e k = kjt — k/ f , which we can finally 
use to compute a better estimate of the parameter matrix 
for the next iteration, 

W k+l = W /f +2 Wf s A .pJ. (13) 

To ensure stability of the LMS algorithm, we need a 
gain factor fj. < l/tr[R], where tr[R] is the trace of the 
input correlation matrix (47). Thus, to calculate a safe 
value for fi, we let /x = 0.1/tr[R] and we use the 
approximation 

tr[R]«^[pJp A -+P* T p*], (14) 
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where we account for both current and reference CPs in 
order to decrease /i when p k is large and to bound /x from 
above when p /f is small. To determine the first vector of 
predicted restraint spring constants k*, we assume a linear 
relationship between each kg and the corresponding 
reference CP p*,. Specifically, we set kg = ao+ci\p*j for 
(ij) e R, where a 0 = 2- 2p* m J(p* mail - p* min ), a x = 2\ 

«ax-/4n) and Z'minCPmax) is the minimum (maximum) 
value of the reference CPs p*, for {ij) e R. This choice 
yields initial spring constants ranging from 20 to 40% of 
the maximum value of 10 that we allow kg to take. Then, 
to begin the restraint optimization procedure with the first 
vector of estimated CPs p 1; we set the diagonal elements of 
Wi equal to 1 and all other elements equal to 0. 

Selection of optimal ensemble 

After a sufficient number of iterations, the restraint opti- 
mization procedure described above should yield a set of 
predicted spring constants kjj that produce a good match 
between the CPs estimated for the 'restrained' bead pairs 
and the corresponding reference CPs, i.e. pg » p*, for 
(ij) e R. Our goal, however, is to generate an optimal 
ensemble of bead-chain conformations such that the CPs 
estimated for 'all' bead pairs, not just the restrained ones, 
closely match the corresponding reference CPs, i.e. we 
want pij ~ p*j for 1 < i < j < N. To quantify the 
goodness of match between estimated and reference CPs, 
we calculate the root mean-squared deviation (RMSD) 
between the two sets of probabilities, 



P^u = ^ ] ^^ ) ZjPu-P^ CB) 

To find the set of restraint spring constants that 
minimize prmsd, we perform 40 iterations of the LMS 
algorithm (Equation 13) during the restraint optimization 
procedure described above. To accelerate these iterations, 
we perform only 5 x 10 6 steps during the BD simulations 
from which the CPs are estimated at each iteration. In 
general, the conformation ensembles produced by such 
short simulations will depend on the initial bead-chain 
conformation used for the BD simulations. Therefore, to 
find the ensemble that minimizes /Jrmsd> we perform 
several trials of the restraint optimization procedure. In 
each trial, we use a different initial conformation for the 
simulations, and we identify the set of restraint spring 
constants that minimize ^rmsd among all iterations per- 
formed. Next, these optimal spring constants and the cor- 
responding initial conformation are used to generate a 
larger conformation ensemble, this time by performing 
10 steps of BD simulation. Among the larger conform- 
ation ensembles obtained from all trials, we select the one 
that yields the smallest /?rmsd- This final ensemble is the 
one we deem to be optimal, i.e. most consistent with the 
reference CPs. 

Generation of initial conformations 

To obtain the different initial bead-chain conformations 
used for each trial of the restraint optimization procedure, 
one could simply generate a number of random con- 
formations. We choose, however, a more deterministic 



approach aimed at generating conformations with differ- 
ent relative orientations of loops. Specifically, we design 
each initial conformation in the shape of a tight cylindrical 
bundle (Figure 6). To generate the bundle, all the beads 
connected by harmonic restraints are arranged on a circle 
whose circumference is just large enough to prevent 
overlapping those beads. Next, the intervening fragments 
that contain the other beads of the chain are used to 
connect the beads on the circle. As they join the beads 
on the circle, these fragments are forced to run perpen- 
dicular to the plane of the circle. Hence, there are two 
ways in which each fragment can connect two adjacent 
beads on the circle: on the same side of the plane of the 
circle, or on opposite sides. By connecting the beads on the 
circle with the intervening fragments in all possible ways, 
we can generate up to 2" c ~ 1 distinct conformations, where 
n c is the number of beads on the circle and where we omit 
those conformations that result from reflecting other con- 
formations about the plane of the circle. In the present 
study, we selected up to 32 different bundle conformations 
to perform the trials of the restraint optimization proced- 
ure (Table 2). 



RESULTS AND DISCUSSION 

Test systems 

To validate our computational method, we considered six 
test systems of increasing complexity. Each test system 
consisted of the same bead-chain model that we used to 
recover an optimal conformation ensemble from reference 
CPs. In each such system, however, we induced the for- 
mation of specific loops by connecting appropriate beads 
with up to eight harmonic restraints (Supplementary 
Table SI). To vary the complexity of these test systems, 
we varied the number of beads in the chain and the 
number of induced loops (Table 2). In particular, we 
simulated chains of 25, 35 and 45 beads with 2, 3 and 4 
'free' loops, respectively. To mimic the effects of confine- 
ment constraints, we also simulated the same chains with 
additional restraints connecting the middle beads of free 
loops across such loops, as shown schematically in 
Supplementary Table SI, thus giving rise to 'tied' loops. 

We used the same value of kg — k s /200 for the spring 
constants of all restrained bead pairs (i, j) in all test 
systems. The conformations of these test systems 
obtained after minimizing their potential energy are 
shown in Figure 3. 

Reference CPs 

To obtain reference sets of estimated bead CPs p*-, for 
1 < i < j < N, in each of the six test systems, we generated 
corresponding ensembles of bead-chain conformations by 
performing BD simulations following the same protocol 
described above for the ensemble recovery procedure. In 
particular, for each test system, we constructed an initial 
bead-chain conformation by threading the appropriate 
number of beads into the path of a 3D Hilbert curve 
(21). We then minimized the potential energy of the 
initial conformation (Figure 3), equilibrated the system 
with 10 6 simulation steps and performed 10 8 additional 
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Table 2. Validation of the conformation ensemble recovery procedure using reference CPs estimated by simulating test systems of increasing 
complexity 

Test system" Ensemble recovery b 

RMSD C 



2 Parameters n + 1 Parameters 



Figure 3 d 
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0.050 


0.0015 


0.020 


0.036 


0.0016 


0.020 
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25 


4 
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4 


16 


2.9 


0.064 


0.0024 


0.012 


0.065 


0.0048 


0.018 
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35 
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Free 


6 
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4.1 


0.017 


0.0015 


0.016 


0.056 


0.0018 


0.018 
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35 
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Tied 
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32 
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0.099 


0.0060 


0.028 


0.122 


0.0055 


0.026 
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45 
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Free 


10 


16 


5.3 


0.023 


0.0016 


0.025 


0.022 


0.0019 


0.025 
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45 
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4 


Tied 


13 


32 


5.5 


0.175 


0.0086 


0.063 


0.147 


0.0103 


0.070 



''Characteristics of test systems used to generate conformation ensembles from which reference CPs were estimated. 
b Results of ensemble recovery procedure applied to reference CPs. 

C RMSD between recovered and reference values of restraint spring constants (k), CPs (p) and mean inter-bead distances (d), achieved when using a 
general linear model (Equation 9) with the specified number of parameters per spring constant. 
Label used to identify test system in Figure 3. 
e Number of beads in the chain. 

'Number of restraints used to induce the loops in the bead chain. 
g Number and type of induced loops. 
h Number of restraints found by peak detection algorithm. 
'Number of trials performed to select the optimal ensemble. 

J Average computation time per trial in hours when performing each trial with n+\ parameters on one core of a 2.2-GHz AMD Opteron Processor 
2427. 




Figure 3. Energy-minimized conformations of the test systems used to 
generate reference CPs for validating the proposed computational 
method. The systems are labeled as in Table 2. Images were generated 
using UCSF Chimera (48). 



steps, during which we collected one bead-chain conform- 
ation every 100 steps. From the collected conformations, 
we estimated p*, using Equation 8. The CPs estimated for 
a chain of 45 beads with four free loops and four tied 
loops are represented as heat maps in Figure 4a. Also 
highlighted are the locations of bead pairs (i, j) that 
were connected by harmonic restraints to induce the for- 
mation of loops or to tie the loops. 

These heat maps qualitatively confirm the intuition that 
p*j for beads connected by restraints and for nearby beads 
along the chain should be greater than the background 
CP. These maps, however, also reveal enhanced CPs for 
pairs of beads that were not directly connected by 
harmonic restraints and that were relatively distant 
along the chain from other restrained beads. A similar 
phenomenon was observed for the chain with 35 beads, 
but not for the chain with 25 beads (data not shown). 
Thus, an enhanced probability of contact between two 
beads in the chain is not always due to an external force 
directly pulling those beads toward each other. These 
results underscore the complexity of interactions that 
can arise even for a chain of only 35 beads, when such a 
chain is subjected to looping constraints. 



Relation between CPs and mean inter-head distances 

Simulating the above test systems to validate our compu- 
tational method also provided an opportunity to investi- 
gate the behavior of chromatin assumed in previous 
related works. In particular, to infer the 3D conformation 
of chromatin from experimentally measured CPs, previous 
studies have assumed that the mean spatial distance d 
between two DNA fragments can be deduced from their 
CP p through a simple functional relation. For example, 
the power law p oc d a has been used with exponents 
a = -1.0 (27,28) and a = -2.0 (30). Alternatively, expo- 
nential decay (29) and logarithmic types of relations (12) 
have also been used. To assess whether a simple relation- 
ship between mean inter-bead distance, 

4 = 77X>>/> ( 16 ) 

c i=l 

and corresponding CP p*, does hold for our test systems, 
we obtained df, from the same conformation ensembles 
that were used to determine p*,. 

First, however, we analyzed the results from simulations 
of bead chains that lacked harmonic restraints. A plot of 
p*, against df-, for 1 < i < j < N, obtained from simulating 
an unrestrained chain of 45 beads, shows that, in the 
absence of constraints inducing loop_ formation, the CPs 
follow a clear trend with a peak at dy « 8a (Figure 5a). 
We observed similar trends for chains with 35 and 25 
beads (data not shown). 

As p*. does not vary significantly among pairs_of beads 
separated by similar mean spatial distances d*y or by 
similar loop lengths j - i (Figure 5a), it is reasonable to 
estimate looping probabilities by averaging p* n over 
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constant values of j - i (Figure 5a, inset). We found that 
such looping probabilities for an unrestrained chain of 45 
beads approximately follow the trend predicted by theory 
for worm-like chains with non-zero persistence length and 
non-zero contact distance (26), thus confirming that our 
simulations can reproduce the behavior of such chains. 
Furthermore, noting a monotonic relation between p*. 
and d*i for d- - > IOct, we also fitted the power 
law p*, oc (d*j) a to our simulation data for the unrestrained 
chain, and we obtained a » —2.2, which approximately 
agrees with the value a = —2.0 reported in (30). These 



4 Free loops 4 Tied loops 




Figure 4. Heat maps representing (a) reference and (b) recovered CPs 
for a chain of 45 beads with (left) four free loops or (right) four tied 
loops. Free loops result from connecting loop end-beads with harmonic 
restraints (gray arcs in top-left schematic), while tied loops result from 
connecting middle beads across free loops (dotted arcs in top-right 
schematic). Blue circles on the maps identify pairs of beads that were 
restrained (a) when generating reference CPs and (b) when performing 
the ensemble recovery procedure. Test systems with two and three 
loops (Table 2) yielded a similarly good visual match between reference 
and recovered CP maps (data not shown). 



results suggest that, in the absence of harmonic restraints 
and for d\, sufficiently large, it may be appropriate to 
assume a simple monotonic relation between />*■ and dfj 
and_to use such relation for predicting approximate values 
of dg from known or measured values of p*-. 

We next analyzed the results from the simulations of the 
test systems, where specific beads were connected by re- 
straints as described above. In this case, the plots of p*, 
against dfj indicate that the addition of _ harmonic 
restraints complicates the relation between d*^ and p*. 
(Figure 5b and c) far beyond the clear trend' obtained 
from the simulations of the unrestrained chains. In par- 
ticular, when harmonic restraints are present, the CPs are 
overall greater than the corresponding values observed in 
the absence of restraints, and there appears to exist no 
simple law that relates p*j and d?-. In fact, different pairs 
of beads separated by similar mean spatial distances or by 
similar loop lengths yield CPs that differ significantly by 
up to four orders of magnitude. The observed variation of 
p*, with d-j for the chain with four free loops is bounded 
by power laws with exponents as different as —3 and —8 
(Figure 5b and c, dashed lines). Hence, for the test systems 
considered in the present study, assuming a simple func- 
tional relation and using such relation to calculate df, 
from p*j would introduce large fractional errors in the 
predicted values of df -, and those errors would increase 
with decreasing bead CPs. These results thus motivate 
the development of computational approaches that do 
not rely on calculating d\, from p*- but directly compare 
estimated CPs with reference CPs to infer a configuration 
ensemble. 

Method validation 

Ensemble recovery from reference CPs 
After obtaining the reference set of bead CPs p*, for each 
test system, we applied our computational method to 
recover an ensemble of conformations whose estimated 
CPs pij match the corresponding reference CPs. We 
began by selecting a few pairs of beads to be connected 
with harmonic restraints. This selection was performed in 



a - Free chain b - 4 free loops c - 4 tied loops 




Mean inter-bead distance 



Figure 5. Variation of bead CPs with mean inter-bead distance in reference ensembles for chain (a) without restraints, (b) with four free loops and 
(c) with four tied loops. Each point represents one of the possible bead pairs in the chain. Error bars are standard deviations over 10 independent 
simulations. The dashed line in (a) is a fit of the power law p oc (d) a , giving a = —2.23. Inset in (a): looping probability versus loop length in ensemble 
for chain without restraints. The curve in this inset is a fit of Equation 3 from (26). The dashed curves in (b) and (c) are power laws with exponents 
—8 and —3. Distances are in units of a. 
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Figure 6. Initial conformations used in eight trials of the ensemble 
recovery procedure for a chain with 35 beads and 6 restraints (third 
row in Table 2), shown before (top) and after (bottom) minimization of 
the potential energy, Equation 1. Images were generated using UCSF 
Chimera (48). 



an automated fashion by analyzing the reference CP maps 
with the peak detection algorithm described above. The 
algorithm successfully identified all of the bead pairs that 
were connected by harmonic restraints in the test systems 
used to generate /?*• (Supplementary Table SI). Moreover, 
for chains with 35 and 45 beads, the algorithm found 2, 3 or 
5 additional bead pairs that were not restrained in the test 
systems (Supplementary Table SI) but nevertheless gave 
rise to CPs enhanced above the background (Figure 4). 

We next adjusted the spring constants kij of the guessed 
restraints by performing up to 32 trials of our iterative 
restraint optimization procedure. Each trial used a differ- 
ent initial chain conformation (Figure 6) to start the BD 
simulations performed to estimate the CPs for the re- 
strained bead chain. 

From each trial, we obtained a different set of restraint 
spring constants together with the corresponding ensemble 
of chain conformations. For each such conformation 
ensemble, we used Equation 15 to calculate ^rmsd, the 
RMSD between the CPs pg estimated for that ensemble 
and the corresponding reference CPs p*- previously 
obtained for the test system under study. We found that 
.Prmsd varies among the trials of the ensemble recovery 
procedure for a given test system and that this variation 
increases with the complexity of the test system (Figure 7), 
indicating that, within the simulated time intervals, the re- 
strained bead chain tends to get trapped into local energy 
minima that depend on the initial chain conformation. 

For each recovered conformation ensemble, we also 
calculated the mean inter-bead distances for 
1 < i <j < N. Then, to compare quantitatively these 
mean inter-bead distances with the corresponding refer- 
ence quantities dy, we calculated the RMSD between 
the two sets of distances (30), 



'RMSD 



1 N(N- 1) 



E K/-4>/) 2 - 



(17) 



l<i<j<N 



We found that minimizing /?rmsd over the trials for a 
given test system yields the smallest, or a relatively small 
value for the RMSD of the mean inter-bead distances, 
^rmsd (Figure 7). These results indicate that minimizing 
/'rmsd relative to a set of reference CPs /?*• is an effective 
strategy for identifying a conformation ensemble that 
closely matches the mean inter-bead distances of the 
original conformation ensemble from which the p* t were 
estimated or measured. 
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Figure 7. Plots of RMSD of mean inter-bead distances (Equation 17) 
against RMSD of CPs (Equation 15) for all trials of the ensemble 
recovery procedure and for all tested systems. Each point represents 
RMSD values obtained from an ensemble of 10 6 conformations at the 
end of a particular trial. Inset: enlarged view of boxed area. Distances 
are in units of a. 

Therefore, to conclude our ensemble recovery proced- 
ure, for each test system, we selected the set of restraint 
spring constants and the corresponding conformation 
ensemble that minimized /?rmsd among all trials. 
Comparing the heat maps of the CPs estimated from re- 
covered and reference ensembles for chains with 45 beads 
(Figure 4a and b) shows a good qualitative agreement 
between the two ensembles. A similar good agreement 
was also observed for the simpler test systems (data not 
shown). This agreement is also apparent in plots of p t j 
against p*j (Figure 8a). 

Furthermore, the mean and standard deviation of the 
inter-bead distances in the recovered conformation ensem- 
bles are in excellent agreement with the correspond- 
ing quantities calculated for the reference ensembles 
(Figure 8b and c), confirming that our procedure success- 
fully recovered not only the average frequency of the 
various inter-bead interactions but also the average 
inter-bead distances and the extent to which these dis- 
tances fluctuate about the mean. 

To visualize the reference and recovered conformation 
ensembles, we uniformly extracted 100 conformations 
from each such ensemble and aligned those conformations 
on the beads that were restrained in the test system used to 
generated the reference ensemble. The resulting 3D repre- 
sentations of the reference and recovered conformation 
ensembles reveal large fluctuations in the positions of 
the loops (Figure 9). The same regions of space, 
however, tend to be occupied by corresponding loops in 
the reference and recovered ensembles, thus providing a 
visual confirmation of the similarity between the average 
spatial arrangements of the two ensembles. 

Simplified general linear model 

The good agreement that we observed between recovered 
and reference ensembles is in fact a consequence of success- 
fully optimizing the general linear model of Equation 9 
with the LMS algorithm. This optimization resulted in a 
good prediction of restraint spring constants from the 
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Figure 8. Comparison of (a,b) bead CPs, (c,d) mean inter-bead dis- 
tances and (e,f) standard deviation of inter-bead distance determined 
from optimal recovered ensemble to respective quantities determined 
from reference ensemble for a chain with 45 beads and (a,c,e) 4 free 
loops or (b,d,f) 4 tied loops. Each point represents one of the possible 
bead pairs in the chain. The dashed lines are plots of y = x, not linear 
fits. Distances are in units of a. Better correlations were observed for 
test systems with three and two loops (data not shown). 



reference CPs associated with the restrained bead pairs. In 
fact, as noted above, some bead pairs were chosen by the 
peak detection algorithm for restraining, even though they 
were not restrained in the test systems. During the ensemble 
recovery procedure, however, the spring constants for the 
restraints on these bead pairs decreased to small values 
relative to the spring constants restraining the other bead 
pairs (Supplementary Table SI). These results indicate that 
the ensemble recovery procedure correctly distinguished 
the pairs of beads that were directly connected by 
harmonic restraints in the reference conformation 
ensemble from those pairs that were not. In particular, 
the RMSD, 



l RMSD — 



Ik 



(18) 



between the spring constants kjj predicted during the 
ensemble recovery procedure and the corresponding 
value used for the restraints in the test systems was 
<6% oi k* t - for all such systems (Table 2). Thus, the pro- 
cedure successfully deduced approximate values of the 
underlying spring constants using only the knowledge of 
reference CPs. 

We asked whether a similarly good prediction of each 
restraint spring constant could be achieved with fewer 
than n + 1 non-zero elements per row in the matrix W in 
Equation 9, i.e. with fewer than n + 1 parameters per 
spring constant. To answer this question, we repeated 
the ensemble recovery procedure on all test systems, this 
time forcing all of the off-diagonal elements of W — except 
those in the last column — to be zero, thus effectively using 
only two parameters to predict each spring constant. This 
choice corresponds to assuming that each restraint spring 
constant kjj is linearly related only to the CP p,j estimated 
for the bead pair (i, j) restrained by that spring constant, 
i.e. kjj = WijPij + Cij. We found that the resulting RMSDs 
of the spring constants, CPs and mean inter-bead dis- 
tances did not differ appreciably from the corresponding 
values obtained by using n + 1 parameters per spring 
constant (Table 2). Therefore, for the test systems con- 
sidered in this study, it appears that the CP associated 
with each restrained bead pair depends primarily on the 
spring constant restraining that bead pair. This conclu- 
sion, however, may not hold for more complex systems, 
where the restraints might be less uniformly distributed 
among the beads and might have more variable spring 
constants than the restraints we used in this study to 
generate the reference CPs. For more complex systems, 
using all parameters in the general linear model of 
Equation 9 may be necessary to achieve adequate 
accuracy in the prediction of spring constants from CPs. 

Computation time 

The majority of the computation time required by the 
proposed ensemble recovery procedure is consumed by 
the BD simulations. These simulations are needed to 
estimate the CPs either for adjusting the spring constants 
through the LMS algorithm or for selecting the optimal 
conformation ensemble through a comparison of /?rmsd 
values among the ensembles obtained from different initial 
conformations. The simulations must be sufficiently long 
to ensure that the variance of the CPs estimated using 
Equation 8 does not outweigh the variation in CP due 
to differences in spring constants and initial conform- 
ations. In our work with the test systems, obtaining CPs 
sufficiently precise to ensure rapid convergence of the 
LMS algorithm in all trials of the restraint optimization 
procedure required 5 x 10 6 steps per simulation. On the 
other hand, ensuring that the optimal ensemble selected 
from several trials of the procedure matches the diversity 
of conformations present in the corresponding reference 
ensemble required 10 8 simulation steps, i.e. the same 
number that was used to obtain each reference 
ensemble. The average computation time per trial was 
found to increase linearly with the number of beads 
(Table 2). 
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Figure 9. Spatial representation of reference (left) and recovered (right) 
conformation ensembles for bead chains of 45 beads with (a) four free 
loops and (b) four tied loops. Each depicted ensemble consists of 100 
conformations extracted at equal intervals from 10 s steps of BD simu- 
lation and aligned on the beads that were connected by harmonic re- 
straints in the simulations used to generate the reference ensembles. The 
coloring order along each chain is red, yellow, green, cyan and blue. 
Images were generated using UCSF Chimera (48). 



CONCLUSION 

We have developed a computational approach to recover 
chromatin conformation ensembles from a set of reference 
CPs. The overall strategy of this approach consists of 
comparing the given set of reference CPs to a set of CPs 
obtained from simulations of a restrained bead-chain 
polymer model of chromatin. The results of this compari- 
son are used iteratively to adjust the parameters of the 
polymer model so that, after a sufficient number of iter- 
ations and trials, an optimal conformation ensemble is 
obtained whose CPs closely match the corresponding ref- 
erence probabilities. We have validated this procedure by 
using reference data sets obtained from simulations of six 
test systems of increasing complexity. For all such 
systems, the procedure yielded a conformation ensemble 
whose CPs, mean inter-bead distances and standard devi- 
ation of inter-bead distances all agree very closely with the 
corresponding reference quantities. The most complex test 
system that we considered was a chain of 45 beads, equiva- 
lent to roughly 1 35-270 kb, containing four tied loops 
(Figure 3f). Although this system is much smaller than 
the genomic loci typically investigated in 3C-based experi- 
ments, it does provide initial support to the validity of the 
proposed computational approach, which can already be 
used to investigate the spatial organization of small 
genomic regions. 

To enable efficient and accurate analysis of experimen- 
tal data sets obtained from large genomic loci, entire 
chromosomes or even entire genomes, the proposed com- 
putational approach will require additional improvement 
and validation. For example, whereas the present 
approach estimates CPs for beads representing fragments 



of equal lengths, 3C-based experiments typically provide 
reference CPs for fragments of various lengths. This 
mismatch could be overcome by mapping the experimen- 
tal fragments onto the bead-chain contour and by 
estimating CPs for pairs of mapped fragments, rather 
than for pairs of beads. Another issue is computational 
effort. Although the procedure we described lends itself to 
parallelization, with each trial executing on a separate pro- 
cessor core, it may nevertheless become too demanding for 
large genomic loci. Computational effort could be lowered 
by improving the efficiency of conformation sampling, for 
example through Monte Carlo simulations, and by 
avoiding Equation 8 in the estimation of CPs, for 
example by inferring inter-bead distance distributions 
from sample means and higher moments of dy. Finally, 
further validation of the procedure will require not only 
the simulation of larger and more complex test systems 
but also the availability of experimental data sets that 
include both 3C and FISH measurements on the same 
genomic region. Applications of our method to the 
analysis of experimental data and to the study of specific 
phenomena, such as gene clustering (31), are important 
issues that will be addressed in the future. 
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APPENDIX 

THE LMS ALGORITHM 

The LMS algorithm (45,46) has found numerous appli- 
cations in the field of adaptive signal processing, 
including adaptive system identification, adaptive 
inverse modeling, adaptive control and adaptive interfer- 
ence canceling (47). This algorithm was developed to 
optimize iteratively and dynamically the weights of a 
digital filter structure, known as ALC, that performs 
the dot product y k = w k x k , where y k is the ALC 
output, w/ f is a vector containing n + 1 adjustable 
weights, X£ is a vector containing n+ 1 inputs and k is 
the current time step for the inputs, weights and output. 
The choice of the inputs in x k and the role of the output 
yk depend on the specific application of the ALC. All 
applications, however, include a desired signal d k and 
require the adjustment of at each time step k so 
that the output y k is, on average, as close as possible 
to dk or, equivalently, so that the magnitude of the error 



£/ f = dk - }'/< = d k - xJw /o 



(Al) 



averaged over a long interval of k, is as small as possible. 
The degree to which the ALC meets this requirement can 
be quantified, as a function of w&, by defining the quad- 
ratic performance surface / = E[s k ], where the expected 



value E[ ] is taken over the time step k. Hence, the require- 
ment to achieve optimality of the ALC is that the weight 
vector Wfc be adjusted at each time step k to minimize x- 
The LMS algorithm addresses this requirement by using a 
variant of the steepest descent algorithm. This variant 
replaces the gradient V/ of the quadratic performance 
surface with a simpler estimate obtained at time step k 
directly from e|, i.e. 



Vx » Vs? = 2s /f Vs A - 



r_3 3_ 



-2e k x k , 



(A2) 

is the gradient operator 



where V 

with respect L to the" components of the weight vector w*. 
The gradient estimate Ve| is then used to c< 
improved weight vector from the current one, 



Wfe+i = w /f - /iVeI = wt k +2fiE k x k , 



(A3) 



where ^ > 0 is a gain factor that determines the size of the 
step along the negative gradient estimate. A small value of 
fi causes slow convergence, whereas too large a value of fi 
causes instability of the algorithm. It has been shown (47) 
that the LMS algorithm is stable for /x < l/tr[R], where 
R = E[x k x[] is the input correlation matrix. The strengths 
of the LMS algorithm are its simplicity, robustness and 
relatively rapid convergence despite the presence of noise 
in the input x k and desired signal d k . 



